A Novel Feature Selection Based on One-Way ANOVA F-Test for E-Mail Spam Classification
نویسندگان
چکیده
Spam is commonly defined as unwanted e-mails and it became a global threat against e-mail users. Although, Support Vector Machine (SVM) has been commonly used in e-mail spam classification, yet the problem of high data dimensionality of the feature space due to the massive number of e-mail dataset and features still exist. To improve the limitation of SVM, reduce the computational complexity (efficiency) and enhancing the classification accuracy (effectiveness). In this study, feature selection based on one-way ANOVA F-test statistics scheme was applied to determine the most important features contributing to e-mail spam classification. This feature selection based on one-way ANOVA F-test is used to reduce the high data dimensionality of the feature space before the classification process. The experiment of the proposed scheme was carried out using spam base wellknown benchmarking dataset to evaluate the feasibility of the proposed method. The comparison is achieved for different datasets, categorization algorithm and success measures. In addition, experimental results on spam base English datasets showed that the enhanced SVM (FSSVM) significantly outperforms SVM and many other recent spam classification methods for English dataset in terms of computational complexity and dimension reduction.
منابع مشابه
A Classification Method for E-mail Spam Using a Hybrid Approach for Feature Selection Optimization
Spam is an unwanted email that is harmful to communications around the world. Spam leads to a growing problem in a personal email, so it would be essential to detect it. Machine learning is very useful to solve this problem as it shows good results in order to learn all the requisite patterns for classification due to its adaptive existence. Nonetheless, in spam detection, there are a large num...
متن کاملA Novel Hybrid Approach for Email Spam Detection based on Scatter Search Algorithm and K-Nearest Neighbors
Because cyberspace and Internet predominate in the life of users, in addition to business opportunities and time reductions, threats like information theft, penetration into systems, etc. are included in the field of hardware and software. Security is the top priority to prevent a cyber-attack that users should initially be detecting the type of attacks because virtual environments are not moni...
متن کاملAn Effective Model for SMS Spam Detection Using Content-based Features and Averaged Neural Network
In recent years, there has been considerable interest among people to use short message service (SMS) as one of the essential and straightforward communications services on mobile devices. The increased popularity of this service also increased the number of mobile devices attacks such as SMS spam messages. SMS spam messages constitute a real problem to mobile subscribers; this worries telecomm...
متن کاملA New Hybrid Approach of K-Nearest Neighbors Algorithm with Particle Swarm Optimization for E-Mail Spam Detection
Emails are one of the fastest economic communications. Increasing email users has caused the increase of spam in recent years. As we know, spam not only damages user’s profits, time-consuming and bandwidth, but also has become as a risk to efficiency, reliability, and security of a network. Spam developers are always trying to find ways to escape the existing filters therefore new filters to de...
متن کاملA Novel One Sided Feature Selection Method for Imbalanced Text Classification
The imbalance data can be seen in various areas such as text classification, credit card fraud detection, risk management, web page classification, image classification, medical diagnosis/monitoring, and biological data analysis. The classification algorithms have more tendencies to the large class and might even deal with the minority class data as the outlier data. The text data is one of t...
متن کامل